Variational methods for parameter estimation are an active research
area, potentially offering computationally tractable heuristics
with theoretical performance bounds. We build on recent work that 
applies such methods to network data, and establish asymptotic nor- 
mality rates for parameter estimates of stochastic blockmodel data, 
by either maximum likelihood or variational estimation. The result 
also applies to various sub-models of the stochastic blockmodel found 
in the literature. 



1. Introduction. The analysis of network data is an open statistical 
problem, with many potential applications in the social sciences [13] and 
in biology [15]. In such applications, the models tend to pose both compu- 
tational and statistical challenges, in that neither their fitting method nor 
their large sample properties are well-understood. 

However, some results are becoming known for a model known as the 
stochastic blockmodel, which assumes that the network connections are 
explainable by a latent discrete class variable associated with each node. 
For this model, consistency has been shown for profile likelihood maximiza- 
tion [1], a spectral-clustering based method [16], and other methods as well 
[2, 6, 7, 8], under varying assumptions on the sparsity of the network and 
the number of classes. These results suggest that the model has reasonable
statistical properties, and empirical experiments suggest that efficient
approximate methods may suffice to find the parameter estimates. However,
there is formally no satisfactory inference theory for the behavior of
classical procedures such as maximum likelihood under the model, nor for any
procedure that is not potentially NP-hard under worst-case analysis.

In this note, we establish both consistency and asymptotic normality of
maximum likelihood estimation, and also of a variational approximation
method, considering sparse models and restricted sub-models under similar
assumptions as required in [1]. To some extent, we are following a pioneering
paper of Celisse et al. [5], in which the dense model was considered, and
consistency was established for a subset of the parameters.

2. Preliminaries. 



2.1. General graph models. We consider a class of latent variable models
for unlabeled graphs considered by various authors [11, 1, 4], which we
describe as follows. Let $Z_1, \ldots, Z_n$ be $\mathcal{Z}$-valued latent random variables,
corresponding to vertices $1, \ldots, n$, let $\pi$ be a distribution on $\mathcal{Z}$, and let $h$ be a
symmetric map $\mathcal{Z} \times \mathcal{Z} \to [0, 1]$. We define the complete graph model (CGM)
for $(Z, A)$, where $Z = (Z_1, \ldots, Z_n)$ and $A$ is the $n \times n$ symmetric 0-1
adjacency matrix of a graph, by its density with respect to an appropriate
reference measure.

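\[
f(Z, A) = \left( \prod_{i=1}^{n} \pi(Z_i) \right) \left( \prod_{i=1}^{n} \prod_{j=i+1}^{n} h(Z_i, Z_j)^{A_{ij}} \bigl(1 - h(Z_i, Z_j)\bigr)^{1 - A_{ij}} \right), \tag{1}
\]

where we may interpret $h(Z_i, Z_j)$ as $\mathbb{P}(\text{edge} \mid Z_i, Z_j)$.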

The graph model (GM) is defined by a distribution $g : \{0, 1\}^{n \times n} \to [0, 1]$,
which satisfies $g(A) = \mathbb{P}(A; h, \pi)$, in addition to the identity

\[
\frac{g}{g_0}(A) = \mathbb{E}_0\!\left[ \frac{f}{f_0}(Z, A) \,\Big|\, A \right], \tag{2}
\]

which holds for any $g_0$ and $f_0$ corresponding to the same choice of $h$ and $\pi$.

It is data from GM which we assume we observe. In [1], $\{Z_i\}_{i=1}^{n}$ are i.i.d.
Uniform(0,1). In [10], they are a multivariate mixture of Gaussians with
unknown parameters. If we make no restrictions on $h$, these models are
equivalent.

2.2. Stochastic Blockmodel. In this paper, we focus on stochastic blockmodels,
in which $\mathcal{Z}$ is a discrete space $\{1, \ldots, K\}$. As a result, we may simplify
the GM model by letting $\pi$ be a discrete distribution over $\{1, \ldots, K\}$,
and by letting $h$ be represented by a matrix $H \in [0, 1]^{K \times K}$.

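We will consider parametric submodels of the blockmodel. Parameterized
CGM and GM densities $f$ and $g$ for the blockmodel generically satisfy

\[
f(Z, A; \theta) = \left( \prod_{i=1}^{n} \pi_\theta(Z_i) \right) \left( \prod_{i=1}^{n} \prod_{j=i+1}^{n} H_\theta(Z_i, Z_j)^{A_{ij}} \bigl(1 - H_\theta(Z_i, Z_j)\bigr)^{1 - A_{ij}} \right)
\]

and

\[
g(A; \theta) = \sum_{Z \in \mathcal{Z}^n} f(Z, A; \theta).
\]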

We will sometimes use a parameterization $\theta = (\rho, \phi)$ in which

\[
H_\theta = \rho S_\phi,
\]

where $\rho > 0$ is a positive scalar; $\phi$ is a Euclidean parameter ranging
over an open set; $S_\phi$ is a symmetric matrix in $\mathbb{R}^{K \times K}$ constrained to satisfy
$\sum_{a,b=1}^{K} \pi_\phi(a) \pi_\phi(b) S_\phi(a, b) = 1$; and the map $\phi \mapsto (\pi_\phi, S_\phi)$ is assumed to
be smooth. Let $\lambda = n\rho$. The interpretation of these parameters is that
$\lambda = \mathbb{E}[\text{degree}]$, $\rho = \mathbb{P}(A_{ij} = 1)$, and

\[
\pi(a) \pi(b) S(a, b) = \mathbb{P}(Z_i = a, Z_j = b \mid A_{ij} = 1).
\]

We will use this parameterization to analyze asymptotic behavior when $\rho =
\rho_n \to 0$ while keeping $\phi$ fixed, as seems reasonable for sparse network
settings.

An interesting class of submodels, discussed by Newman [12], are the
"degree-corrected" blockmodels with $UV$ classes obtained by considering
$Z_i = (Z_{i1}, Z_{i2})$, for $i = 1, \ldots, n$, which take values $(u, v)$; where $u$ takes
values $1, \ldots, U$ with probabilities $\alpha_1, \ldots, \alpha_U$; and given parameters $\gamma_1, \ldots, \gamma_V \in
[0, 1]$, $v$ takes values $\gamma_1, \ldots, \gamma_V$ with probabilities $\beta_1, \ldots, \beta_V$. We will assume
$Z_{i1}$ and $Z_{i2}$ are independent. The additional parameters needed form a $U \times U$
symmetric matrix of probabilities $G$. We can now define

\[
\mathbb{P}(Z_{i1} = a, Z_{i2} = \gamma_c, Z_{j1} = b, Z_{j2} = \gamma_d \mid A_{ij} = 1) = \alpha_a \alpha_b \beta_c \beta_d \gamma_c \gamma_d G(a, b).
\]

So although this is a UV blockmodel, it has only U(U + l)/2 + (U - 1) + 
(2V — 1) parameters. Its interpretation is that there are U subblocks, but 
within each subblock, vertices can hierarchically exhibit further affinities to 
vertices both within the same block and other blocks, thus enabling, for 
instance, distinction between vertices of high degree and low degree within 
each block. This distinction is not block-dependent, resulting in a reduction 
of parameters. 
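The parameter count above can be checked by assembling the $UV$-class blockmodel explicitly. The following is a minimal sketch, not taken from the paper: the function names and the example values are hypothetical, class $(a, c)$ is given index `a * V + c`, and the matrix is rescaled so that $\sum \pi(a)\pi(b)S(a,b) = 1$, as required by the parameterization of Section 2.2.

```python
import numpy as np

def degree_corrected_blockmodel(alpha, beta, gamma, G):
    """Build (pi, S) for the UV-class blockmodel (illustrative sketch).

    Class (a, c) has probability alpha[a] * beta[c], and the affinity between
    classes (a, c) and (b, d) is proportional to gamma[c] * gamma[d] * G[a, b].
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    G = np.asarray(G, dtype=float)
    pi = np.kron(alpha, beta)                 # entry a*V + c equals alpha[a] * beta[c]
    S = np.kron(G, np.outer(gamma, gamma))    # entry (a*V+c, b*V+d) equals G[a,b]*gamma[c]*gamma[d]
    S = S / (pi @ S @ pi)                     # enforce the normalization constraint
    return pi, S

def n_free_parameters(U, V):
    """U(U+1)/2 entries of G, U-1 for alpha, V for gamma and V-1 for beta."""
    return U * (U + 1) // 2 + (U - 1) + (2 * V - 1)

# Hypothetical example with U = 2 blocks and V = 2 degree levels.
pi, S = degree_corrected_blockmodel(
    alpha=[0.5, 0.5], beta=[0.3, 0.7], gamma=[0.2, 0.9],
    G=[[0.8, 0.1], [0.1, 0.6]],
)
```

For $U = V = 2$ this gives 7 free parameters, whereas an unrestricted blockmodel with $K = UV = 4$ classes has $K(K+1)/2 + (K-1) = 13$.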

Many variants are of course possible; for example, one can choose to have
more parameters by having the $(u, v)$ block probabilities be free, so that the
conditional distribution of $Z_{i2}$ depends on $Z_{i1}$, or fewer parameters by
treating $\alpha(1), \ldots, \alpha(U)$ as known.

2.3. Maximum likelihood and variational estimates. For the complete
graph blockmodel, maximum likelihood estimation of $H$ and $\pi$ (or of $\theta$)
is basically understood. From Eq. (1) it can be seen that the log likelihood
expression decomposes, so that $\pi$ is estimated from $Z$ independently of $A$,
and $H$ is estimated from $A$ conditional on $Z$. We note that it is possible for
the likelihood to have multiple local optima; in particular this is the case for
the degree-corrected blockmodel CGM.

For the GM blockmodel, the maximum likelihood parameter estimate $\hat\theta^{ML}$
is given by

\[
\hat\theta^{ML} = \operatorname*{argmax}_{\theta} \, g(A; \theta) = \operatorname*{argmax}_{\theta} \sum_{Z \in \mathcal{Z}^n} f(Z, A; \theta).
\]

Multiple local optima in $g$ may exist even if the CGM likelihood function $f$
is concave in the appropriate parameterization, as we shall see for the ordinary
unrestricted parameterization. Additionally, the maximum likelihood
estimate involves a generally intractable marginalization over the latent
variable $Z$.

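Variational methods attempt to circumvent the second difficulty (while
accepting the first) by introducing an approximate function $J$ for which
local optimization is computationally easier. For the GM blockmodel, the
estimate $\hat\theta^{VAR}$ is given by

\[
\hat\theta^{VAR} = \operatorname*{argmax}_{\theta} \max_{q \in \mathcal{D}} J(q, \theta; A)
= \operatorname*{argmax}_{\theta} \max_{q \in \mathcal{D}} \bigl\{ -D(q \,\|\, f_{Z|A;\theta}) + \log g(A; \theta) \bigr\}.
\]

Here $\mathcal{D}$ is the set of all product distributions over $\mathcal{Z}^n$, with densities denoted
by $\prod_{i=1}^{n} q_i(\cdot)$. The term $D(\cdot \,\|\, \cdot)$ is the Kullback-Leibler divergence, and
$f_{Z|A;\theta}$ is the conditional density of $Z$ given $A$, i.e., $f_{Z|A;\theta}(z) = f(z, A; \theta) / g(A; \theta)$. The
Kullback-Leibler divergence is given by

\[
D(q \,\|\, f_{Z|A;\theta}) = \sum_{z \in \mathcal{Z}^n} q(z) \log \frac{q(z)}{f_{Z|A;\theta}(z)}.
\]

We note that $J$ simplifies to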

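\begin{align*}
J(q, \theta; A) ={}& \sum_{i=1}^{n} \sum_{a=1}^{K} q_i(a) \bigl[ -\log q_i(a) + \log \pi_\theta(a) \bigr] \\
&+ \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{a=1}^{K} \sum_{b=1}^{K} q_i(a) q_j(b) \bigl[ A_{ij} \log H_\theta(a, b) + (1 - A_{ij}) \log\bigl(1 - H_\theta(a, b)\bigr) \bigr].
\end{align*}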

This formula indicates that, at least for the complete parameterization, a
local optimum to $J$ can be tractably computed for moderate $n$ and $K$ using
the EM algorithm as in [9]. In contrast, optimization of $g$ requires a summation
over $\mathcal{Z}^n$ which is generally intractable. However, note that we have
added $n(K - 1)$ new parameters.

We remark that $\exp(J) \le g$ always, due to the nonnegativity of the
Kullback-Leibler divergence. Intuitively, we expect the variational estimate
to approximate the maximum likelihood estimate when there exists $q \in \mathcal{D}$
which is close to $f_{Z|A;\theta}$. We also remark that $J$ takes a similar form under
the general model with continuous $Z$ and parameters $(\pi, h)$, suggesting that
the approximation may have utility in that setting as well.
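The bound $\exp(J) \le g$ can be checked numerically on a very small graph, where the sum over $\mathcal{Z}^n$ is still feasible. The sketch below is illustrative only and is not the authors' implementation: the function names are hypothetical, the entries of $H$ are assumed to lie strictly in $(0, 1)$, $q$ is assumed strictly positive, and `log_g` enumerates all $K^n$ labelings, so it is usable only for tiny $n$.

```python
import itertools
import numpy as np

def log_f(z, A, pi, H):
    """Complete-data log density log f(Z, A; theta) for integer labels z."""
    z = np.asarray(z)
    iu = np.triu_indices(len(z), k=1)          # pairs i < j
    P = H[np.ix_(z, z)][iu]                    # edge probabilities H(z_i, z_j)
    a = A[iu]
    return np.log(pi[z]).sum() + (a * np.log(P) + (1 - a) * np.log1p(-P)).sum()

def log_g(A, pi, H):
    """Marginal log likelihood by brute-force summation over all K^n labelings."""
    n, K = A.shape[0], len(pi)
    vals = [log_f(z, A, pi, H) for z in itertools.product(range(K), repeat=n)]
    return np.logaddexp.reduce(vals)

def variational_objective(q, A, pi, H):
    """J(q, theta; A) for a product distribution q (rows q_i, strictly positive)."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    prior_entropy = (q * (np.log(pi)[None, :] - np.log(q))).sum()
    E1 = q @ np.log(H) @ q.T                   # (i, j): sum_ab q_i(a) q_j(b) log H(a, b)
    E0 = q @ np.log1p(-H) @ q.T                # same with log(1 - H(a, b))
    return prior_entropy + (A[iu] * E1[iu] + (1 - A[iu]) * E0[iu]).sum()

# Hypothetical example: n = 5 vertices, K = 2 classes.
rng = np.random.default_rng(0)
pi = np.array([0.4, 0.6])
H = np.array([[0.7, 0.2], [0.2, 0.5]])
upper = np.triu(rng.random((5, 5)) < 0.4, k=1)
A = (upper | upper.T).astype(int)
q = rng.dirichlet(np.ones(2), size=5)
gap = log_g(A, pi, H) - variational_objective(q, A, pi, H)   # equals D(q || f_{Z|A}) >= 0
```

The cost of evaluating `variational_objective` is polynomial in $n$ and $K$, while `log_g` grows as $K^n$, which is the contrast drawn in the text.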



3. Results. 



3.1. Asymptotic normality of maximum likelihood under CGM blockmodel. 
We first review the asymptotics of the CGM block model with complete pa- 
rameterization. 

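Let $\varpi \in \mathbb{R}^{K-1}$ and $\mu \in \mathbb{R}^{K \times K}$ be the logits of $\pi$ and $H$, given by

\begin{align*}
\varpi(a) &= \log \frac{\pi(a)}{1 - \sum_{b=1}^{K-1} \pi(b)}, & a &= 1, \ldots, K - 1, \\
\mu(a, b) &= \log \frac{H(a, b)}{1 - H(a, b)}, & a, b &= 1, \ldots, K.
\end{align*}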



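Given data $A$ generated by the model, let $f_0$ and $\varpi_0, \mu_0$ correspond to the
generative parameter values. For the CGM blockmodel, the log likelihood
ratio $\Lambda = \log \frac{f}{f_0}$ as a function of $\theta = (\varpi, \mu)$ is given by

\begin{align*}
\Lambda(\theta, Z, A) ={}& \sum_{a=1}^{K-1} \bigl(\varpi(a) - \varpi_0(a)\bigr) n_a - n \log \frac{1 + \sum_{a=1}^{K-1} e^{\varpi(a)}}{1 + \sum_{a=1}^{K-1} e^{\varpi_0(a)}} \\
&+ \sum_{a=1}^{K} \sum_{b=1}^{K} \left[ \bigl(\mu(a, b) - \mu_0(a, b)\bigr) O_{ab} - n_{ab} \log \frac{1 + e^{\mu(a, b)}}{1 + e^{\mu_0(a, b)}} \right],
\end{align*}

where

\[
O_{ab} = \sum_{i=1}^{n} \sum_{j=i+1}^{n} 1\{Z_i = a, Z_j = b\} A_{ij}, \qquad
n_a = \sum_{i=1}^{n} 1\{Z_i = a\}, \qquad
n_{ab} = \sum_{i=1}^{n} \sum_{j=i+1}^{n} 1\{Z_i = a, Z_j = b\}.
\]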

This is an exponential family in $\theta$ with gradient $\nabla\Lambda$ given by

\begin{align*}
(\nabla\Lambda)_{\varpi(a)} &= n_a - n\pi(a), & a &= 1, \ldots, K - 1, \\
(\nabla\Lambda)_{\mu(a,b)} &= O_{ab} - n_{ab} H(a, b), & a &= 1, \ldots, K, \; b = a, \ldots, K,
\end{align*}

provided standard regularity conditions hold, e.g., $\theta$ is an interior point of the
canonical parameter space, and $\mathbb{E}(\nabla\Lambda)(\nabla\Lambda)^T$ is of rank $K(K + 3)/2 - 1$
uniformly on the set $\{\theta : |\theta - \theta_0| < M\}$.
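Setting the gradient to zero gives the CGM estimates in closed form: $\hat\pi(a) = n_a / n$ and $\hat H(a, b) = O_{ab} / n_{ab}$. The following is a minimal sketch of that computation under stated assumptions: the function names are hypothetical, labels are coded $0, \ldots, K-1$, the ordered counts for $(a, b)$ and $(b, a)$ are pooled because $H$ is symmetric (a choice made here, not spelled out in the text), and every pair of classes is assumed to contain at least one vertex pair.

```python
import numpy as np

def cgm_counts(z, A, K):
    """Return n_a, O_ab and n_ab, counting vertex pairs with i < j."""
    z = np.asarray(z)
    n = len(z)
    onehot = np.eye(K)[z]                       # n x K membership indicators
    upper = np.triu(np.ones((n, n)), k=1)       # 1 where i < j
    n_a = onehot.sum(axis=0)
    O = onehot.T @ (upper * A) @ onehot         # edges between classes, i < j
    n_ab = onehot.T @ upper @ onehot            # pairs between classes, i < j
    return n_a, O, n_ab

def cgm_mle(z, A, K):
    """Complete-data maximum likelihood estimates of pi and H."""
    n_a, O, n_ab = cgm_counts(z, A, K)
    pi_hat = n_a / n_a.sum()
    # Pool (a, b) and (b, a) since H(a, b) = H(b, a); assumes no zero pair counts.
    H_hat = (O + O.T) / (n_ab + n_ab.T)
    return pi_hat, H_hat

# Hypothetical example: two classes of three vertices each.
z = [0, 0, 0, 1, 1, 1]
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 3), (2, 5), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1
pi_hat, H_hat = cgm_mle(z, A, K=2)
```

In this example two of the three within-class pairs of class 0 are edges, two of the nine between-class pairs are edges, and all three within-class pairs of class 1 are edges, so the estimate of $H$ has entries $2/3$, $2/9$ and $1$.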

Lemma 1 (Local asymptotic normality). For the CGM with parameter
values $\varpi_0, \mu_0$, under standard regularity conditions it holds for any $s, t$ that

\[
\Lambda\!\left( \varpi_0 + \frac{s}{\sqrt{n}}, \; \mu_0 + \frac{t}{\sqrt{n^2 \rho_0}} \right)
= s^T Y_1 + t^T Y_2 - \frac{1}{2} s^T \Sigma_1 s - \frac{1}{2} t^T \Sigma_2 t + o_P(1),
\]

where $Y_1 \sim N(0, \Sigma_1)$ and $Y_2 \sim N(0, \Sigma_2)$ are independent, and $\Sigma_1$ and
$\Sigma_2$ are functions of $\varpi_0$ and $\mu_0$.

Proof. Taylor expand and apply the central limit theorem. □

Lemma 2. Assume that $n\lambda \to \infty$, that $0 < \pi_a < 1$ for $a = 1, \ldots, K$, and
that $S_{ab} > 0$ for all $a, b$. Then if $\hat S^{CGM}$ and $\hat\pi^{CGM} = (\hat\pi(1), \ldots, \hat\pi(K - 1))$
are maximum likelihood estimates for the CGM with parameters $\pi_0, S_0$, it
holds under standard regularity conditions that under $(\pi_0, S_0)$,

\begin{align*}
\sqrt{n}\,(\hat\pi^{CGM} - \pi_0) &\to N(0, \Sigma_1), \\
\sqrt{n\lambda}\,(\hat S^{CGM} - S_0) &\to N(0, \Sigma_2).
\end{align*}

Proof. Standard exponential family theory, e.g. [3]. □

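3.2. Asymptotic normality of maximum likelihood under GM blockmodel.
Our main result is that if $(Z, A)$ are data from a blockmodel with generative
parameter $\theta_0 \in \Theta$, and $\hat\theta^{ML}$ achieves the local optimum of $\theta \mapsto g(A, \theta)$ closest
to $\theta_0$, then asymptotic normality of $\hat\theta^{ML}$ holds.

Definition 1. A classification of $Z$ is any onto mapping $\hat Z : \{1, \ldots, n\} \to
\{1, \ldots, K\}$ which depends only on $A$.

An essentially correct classification to order $\gamma_n$ on $\Theta$ is one such that
for all $\theta_0 \in \Theta$,

\begin{align}
\sup_{\theta \in \Theta} \mathbb{P}_\theta\bigl(Z \ne \hat Z(A) \mid A\bigr) \frac{g(A; \theta)}{g(A; \theta_0)} &= o_P(\gamma_n), \tag{3} \\
\sup_{\theta \in \Theta} \mathbb{P}_\theta\bigl(Z, A : Z \ne \hat Z(A)\bigr) &= o(\gamma_n). \tag{4}
\end{align}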

We will use a result from [1] to establish that an essentially correct
classification exists for the blockmodel under certain conditions. We defer the proof
of Theorem 1 to a later section.

Theorem 1. Let $Z, A$ be generated from a blockmodel, parameterizable
as $\theta = (\rho, \pi_\phi, S_\phi)$, such that $\phi$ is fixed, $S_\phi$ has no identical columns, and
$\rho = \rho_n$ satisfies $n\rho_n / \log n \to \infty$. Let $\Theta = (n^{-1} \log n, 1) \times \Phi$, where $\Phi$ is an open
compact set. For all $c > 0$, there exists an essentially correct classification
to order $\gamma_n(K) = o(n^{-c})$ on $\Theta$.



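Theorem 2. Suppose an essentially correct classification to order $\gamma_n(K) =
o(1)$ on $\Theta$ holds, and that $(Z, A)$ are generated from the model with parameter
$\theta_0 \in \Theta$. Then for all $\theta \in \Theta$,

\[
\frac{g}{g_0}(A, \theta) = \frac{f}{f_0}(Z, A, \theta)\bigl(1 + \epsilon_n(K, \theta)\bigr) + \epsilon_n(K, \theta), \tag{5}
\]

where $\sup_{\theta \in \Theta} \epsilon_n(K, \theta) = o_P(1)$.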

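Proof of Theorem 2. Let $1_E$ denote the indicator of the event $E =
\{Z, A : Z = \hat Z(A)\}$, and let $\bar E$ denote its complement. Then

\begin{align*}
\frac{g}{g_0}(A, \theta)
&= \mathbb{E}_{\theta_0}\!\left[ \frac{f}{f_0}(Z, A, \theta) \,\Big|\, A \right] \\
&= \mathbb{E}_{\theta_0}\!\left[ \frac{f}{f_0}(Z, A, \theta) 1_E \,\Big|\, A \right]
 + \mathbb{E}_{\theta_0}\!\left[ \frac{f}{f_0}(Z, A, \theta) 1_{\bar E} \,\Big|\, A \right] \\
&= \frac{f}{f_0}(\hat Z, A, \theta)\, \mathbb{P}_{\theta_0}(E \mid A)
 + \mathbb{P}_\theta(\bar E \mid A)\, \frac{g}{g_0}(A, \theta) \\
&= \frac{f}{f_0}(Z, A, \theta)\, \mathbb{P}_{\theta_0}(E \mid A)
 + \left( \frac{f}{f_0}(\hat Z, A, \theta) - \frac{f}{f_0}(Z, A, \theta) \right) \mathbb{P}_{\theta_0}(E \mid A)\, 1_{\bar E}
 + \mathbb{P}_\theta(\bar E \mid A)\, \frac{g}{g_0}(A, \theta) \\
&= \frac{f}{f_0}(Z, A, \theta)\bigl(1 - o_P(1)\bigr) + o_P(1) + o_P(1),
\end{align*}

where in the last equation we used $\mathbb{P}_{\theta_0}(E \mid A) = 1 - o_P(1)$ by Eq. (3) (noting
that $\frac{g}{g_0}(A, \theta_0) = 1$); used $\bigl( \frac{f}{f_0}(\hat Z, A, \theta) - \frac{f}{f_0}(Z, A, \theta) \bigr) \mathbb{P}_{\theta_0}(E \mid A)\, 1_{\bar E} = o_P(1)$
(since $\mathbb{P}(\bar E) = o(1)$ by Eq. (4)); and used $\mathbb{P}_\theta(\bar E \mid A)\, \frac{g}{g_0}(A, \theta) = o_P(1)$ by
Eq. (3). The $o_P(1)$ terms are uniform over all $\theta \in \Theta$. □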

Theorem 3. Assuming the conditions of Theorem 1 and 2, let $\hat\pi^{ML}, \hat S^{ML}$
and $\hat\pi^{CGM}, \hat S^{CGM}$ be the corresponding maximum likelihood estimates over
all $\theta \in \Theta$. It holds that

\begin{align}
\hat\pi^{ML} - \hat\pi^{CGM} &= o_P(n^{-1/2}), \notag \\
\hat S^{ML} - \hat S^{CGM} &= o_P\bigl((n\lambda)^{-1/2}\bigr). \tag{6}
\end{align}

Proof of Theorem 3. By Theorems 1 and 2, the maximizers and local
maximizers of the CGM and GM likelihoods must be consistent since
$\frac{f}{f_0}(Z, A, \theta_0) = \frac{g}{g_0}(A, \theta_0) = 1$, so that $\epsilon(K, \theta)$ is uniformly negligible. It follows
that $\frac{f}{f_0}(Z, A, \hat\theta^{CGM}) - \frac{f}{f_0}(Z, A, \hat\theta^{ML}) = o_P(1)$. The conclusion follows
by Lemma 1, which states that $\log \frac{f}{f_0}$ has nonvanishing curvature at $\theta_0$. □

We thus have established asymptotic normality not only for the blockmodel
under the full parameterization, but also for submodels such as the
degree-corrected variant. For a general parameterization, we have $\hat\phi^{ML} -
\hat\phi^{CGM} = O_P(n^{-1/2})$ generically. If $\phi$ is separable into $(\phi_\pi, \phi_S)$ such that
$\pi = \pi_{\phi_\pi}$ and $S = S_{\phi_S}$, and $\phi_\pi$ and $\phi_S$ are allowed to vary freely, then
$\hat\phi_S^{ML} - \phi_S$ is asymptotically normal with the faster rate $\sqrt{n\lambda}$, assuming
standard regularity conditions. Independence of $\hat\phi_\pi^{ML}$ and $\hat\phi_S^{ML}$ is then also
valid. Also, the same holds for the CGM case.

3.3. Asymptotic normality of variational estimates under GM blockmodel. 
The same properties that we have established for maximum likelihood esti- 
mates under the GM blockmodel also hold for the more computable varia- 
tional likelihood estimates. 

Theorem 4. Let $J(\theta; A)$ denote $\max_{q \in \mathcal{D}} \exp[J(q, \theta; A)]$. Under the
conditions of Theorem 1 and 2,

\[
\frac{J(\theta; A)}{g(A; \theta_0)} = \frac{f}{f_0}(Z, A; \theta)\bigl(1 + o_P(1)\bigr) + o_P(1), \tag{7}
\]

and hence the conclusions of Theorem 3 apply to $\hat\pi^{VAR}, \hat S^{VAR}$, the
variational likelihood estimates.

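Proof. As usual, we let $g_0$ and $\theta_0$ correspond to the generative model.
We establish Eq. (7).

Let $\delta_{\hat Z}$ denote the indicator function $\delta_{\hat Z}(Z) = 1\{Z = \hat Z(A)\}$. We observe
that

\[
\exp[J(\delta_{\hat Z}, \theta; A)] = f(\hat Z(A), A; \theta),
\]

and it thus follows that

\begin{align*}
\frac{f}{f_0}(\hat Z(A), A; \theta)
&= \frac{\exp[J(\delta_{\hat Z}, \theta; A)]}{f(\hat Z, A; \theta_0)} \\
&\le \max_{q \in \mathcal{D}} \frac{\exp[J(q, \theta; A)]}{f(\hat Z, A; \theta_0)} \\
&= \max_{q \in \mathcal{D}} \frac{\exp[J(q, \theta; A)]}{g(A; \theta_0)\, f(\hat Z, A; \theta_0)\, g(A; \theta_0)^{-1}} \\
&= \max_{q \in \mathcal{D}} \frac{\exp[J(q, \theta; A)]}{g(A; \theta_0)\, \mathbb{P}_{\theta_0}(Z = \hat Z(A) \mid A)} \\
&= \max_{q \in \mathcal{D}} \frac{\exp[J(q, \theta; A)]}{g(A; \theta_0)\bigl(1 - o_P(1)\bigr)},
\end{align*}

where the $o_P(1)$ term converges uniformly over all $\theta \in \Theta$. Rearranging terms,
it therefore follows that

\begin{align}
\max_{q \in \mathcal{D}} \frac{\exp[J(q, \theta; A)]}{g(A; \theta_0)}
&\ge \frac{f}{f_0}(\hat Z(A), A; \theta)\bigl(1 - o_P(1)\bigr) \notag \\
&= \frac{f}{f_0}(Z, A; \theta)\bigl(1 - o_P(1)\bigr)
 + \left( \frac{f}{f_0}(\hat Z(A), A; \theta) - \frac{f}{f_0}(Z, A; \theta) \right) \bigl(1 - o_P(1)\bigr)\, 1\{Z \ne \hat Z(A)\} \notag \\
&= \frac{f}{f_0}(Z, A; \theta)\bigl(1 - o_P(1)\bigr) + o_P(1). \tag{8}
\end{align}

As mentioned in Section 2.3, it holds for all $(q, \theta)$ that $\exp[J(q, \theta; A)] \le
g(A; \theta)$. As a result, Theorem 2 implies that

\begin{align}
\max_{q \in \mathcal{D}} \frac{\exp[J(q, \theta; A)]}{g(A; \theta_0)}
&\le \frac{g(A; \theta)}{g(A; \theta_0)} \notag \\
&= \frac{f(A, Z; \theta)}{f(A, Z; \theta_0)}\bigl(1 + o_P(1)\bigr) + o_P(1), \tag{9}
\end{align}

where the $o_P(1)$ terms converge uniformly over all $\theta \in \Theta$.

Combining Eqs. (8) and (9) yields that for all $\theta \in \Theta$, the quantity
$\max_{q \in \mathcal{D}} \frac{\exp[J(q, \theta; A)]}{g(A; \theta_0)}$ is upper and lower bounded by $\frac{f}{f_0}(Z, A; \theta)(1 \pm o_P(1)) +
o_P(1)$, establishing the theorem. □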

4. Some statistical applications. With these results, we can show 
that some standard inference is valid using the likelihood or variational 
likelihood for blockmodels. 

We have that $\hat\theta^{VAR} = (\hat\pi^{VAR}, \hat S^{VAR})$ under $P_{\theta_0}$ is asymptotically normal
with mean $\theta_0$ and variance-covariance matrices given by Theorem 3 and
Lemma 2. Since $\theta \mapsto \Sigma(\theta)$ is continuous, we can evidently form tests and confidence
regions based on $\sqrt{n}\,(\hat\pi^{VAR} - \pi_0)^T \hat\Sigma_1^{-1/2}$ and $\sqrt{n\hat\lambda}\,(\hat S^{VAR} - S_0)^T \hat\Sigma_2^{-1/2}$,
where $\hat\Sigma_1$ and $\hat\Sigma_2$ are plug-in estimates of $\Sigma_1$ and $\Sigma_2$ using $\hat\theta^{VAR}$, and $\hat\lambda$ equals
the average degree in the observed data. The same applies to $\hat\theta^{ML}$.

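Under the CGM standard blockmodel with generative parameter $\theta_0$, the
Wilks statistic is given by

\[
\Lambda(Z, A; \hat\theta^{CGM}) = 2 \log \frac{f}{f_0}(Z, A, \hat\theta^{CGM}) \to \chi^2_{K(K+3)/2 - 1}.
\]

A consequence of Theorem 2 is that under suitable conditions

\[
\Lambda_G(A; \hat\theta^{ML}) = 2 \log \frac{g}{g_0}(A; \hat\theta^{ML}) = \Lambda(Z, A; \hat\theta^{CGM}) + o_P(1),
\]

so that the Wilks statistics for the GM and CGM estimates have the same
asymptotic distribution. To see this, observe that Theorem 2 establishes for
all $\theta \in \Theta$ that

\[
\frac{g}{g_0}(A; \theta) = \frac{f}{f_0}(Z, A; \theta)\bigl(1 + \epsilon_n(K, \theta)\bigr) + \epsilon_n(K, \theta),
\]

where $\sup_{\theta \in \Theta} \epsilon_n(K, \theta) = o_P(1)$. By this result, it follows that

\begin{align*}
\sup_{\theta \in \Theta} \log \frac{g}{g_0}(A; \theta)
&= \sup_{\theta \in \Theta} \log \left( \frac{f}{f_0}(Z, A; \theta)\bigl(1 + \epsilon_n(K, \theta)\bigr) + \epsilon_n(K, \theta) \right) \\
&\le \sup_{\theta \in \Theta} \log \left( \frac{f}{f_0}(Z, A; \theta)\bigl(1 + \epsilon_n(K, \theta)\bigr) \right) + o_P(1) \\
&\le \sup_{\theta \in \Theta} \log \frac{f}{f_0}(Z, A; \theta) + o_P(1).
\end{align*}

By similar arguments, it also follows that

\[
\sup_{\theta \in \Theta} \log \frac{g}{g_0}(A; \theta) \ge \sup_{\theta \in \Theta} \log \frac{f}{f_0}(Z, A; \theta) - o_P(1).
\]

Since $\Lambda_G(A; \hat\theta^{ML}) = 2 \sup_{\theta \in \Theta} \log \frac{g}{g_0}(A; \theta)$ and $\Lambda(Z, A; \hat\theta^{CGM}) = 2 \sup_{\theta \in \Theta} \log \frac{f}{f_0}(Z, A; \theta)$,
we have upper and lower bounded $\Lambda_G(A; \hat\theta^{ML})$ by $\Lambda(Z, A; \hat\theta^{CGM}) \pm o_P(1)$.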

A similar result holds for the Wilks statistic of the variational estimate
$\hat\theta^{VAR}$. To see this, we observe that since $J(\theta; A) = \max_{q \in \mathcal{D}} \exp[J(q, \theta; A)] \le
g(A; \theta)$, it holds that

\[
\frac{J(\theta; A)}{J(\theta_0; A)} \ge \frac{J(\theta; A)}{g(A; \theta_0)},
\]

imsart-aos ver. 2012/04/10 file: ims-template.tex date: July 5, 2012 



VARIATIONAL APPROXIMATION FOR STOCHASTIC BLOCKMODELS



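so that Theorem 4 implies

\[
\frac{J(\theta; A)}{J(\theta_0; A)} \ge \frac{f}{f_0}(Z, A; \theta)\bigl(1 + o_P(1)\bigr) + o_P(1).
\]

To upper bound the same quantity, we observe that

\begin{align*}
\frac{J(\theta; A)}{J(\theta_0; A)}
&\le \frac{g(A; \theta)}{f(\hat Z, A; \theta_0)} \\
&= \frac{g(A; \theta)}{g(A; \theta_0)\, f(\hat Z, A; \theta_0)\, g(A; \theta_0)^{-1}} \\
&= \frac{g(A; \theta)}{g(A; \theta_0)\bigl(1 - o_P(1)\bigr)},
\end{align*}

using similar steps as in the proof of Theorem 4. Thus, the arguments used
to bound $\Lambda_G$ also imply

\[
\Lambda_V(\hat\theta^{VAR}) = 2 \log \frac{J(\hat\theta^{VAR}; A)}{J(\theta_0; A)} = \Lambda(Z, A; \hat\theta^{CGM}) + o_P(1).
\]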

A third approach to inference, the parametric bootstrap, is also valid for
$\hat\theta^{VAR}$. The algorithm is

1. Estimate $\theta$ by $\hat\theta^{VAR}$.

2. Generate $B$ graphs of size $n$ according to the blockmodel with parameter
$\hat\theta^{VAR}$, producing $(Z^*_1, A^*_1), \ldots, (Z^*_B, A^*_B)$.

3. Fit $A^*_1, \ldots, A^*_B$ by variational likelihood to get $\hat\theta^{VAR*}_1, \ldots, \hat\theta^{VAR*}_B$.

4. Compute the variance-covariance matrix of these $B$ vectors and use it
as an estimate of the truth, or similarly, use the empirical distribution
function of the vectors.
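The four steps above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: `sample_blockmodel` and `parametric_bootstrap` are hypothetical names, and `fit` stands for any user-supplied routine that maps an adjacency matrix to a parameter vector. In the usage lines a simple edge-density estimator is used as a stand-in so that the example runs on its own; in practice `fit` would be the variational fitting routine.

```python
import numpy as np

def sample_blockmodel(pi, H, n, rng):
    """Draw latent classes Z and a symmetric 0-1 adjacency matrix A."""
    z = rng.choice(len(pi), size=n, p=pi)
    P = H[np.ix_(z, z)]                              # edge probabilities H(z_i, z_j)
    upper = np.triu(rng.random((n, n)) < P, k=1)     # independent edges for i < j
    A = (upper | upper.T).astype(int)                # symmetrize, zero diagonal
    return z, A

def parametric_bootstrap(pi_hat, H_hat, n, B, fit, rng):
    """Steps 2-4: simulate B graphs from the fitted model, refit, summarize."""
    draws = []
    for _ in range(B):
        _, A_star = sample_blockmodel(pi_hat, H_hat, n, rng)
        draws.append(np.ravel(fit(A_star)))          # fit: hypothetical user-supplied estimator
    draws = np.asarray(draws)
    cov = np.atleast_2d(np.cov(draws, rowvar=False)) # bootstrap variance-covariance matrix
    return draws, cov

# Usage with a stand-in estimator (overall edge density), not the variational fit.
rng = np.random.default_rng(0)
pi_hat = np.array([0.5, 0.5])
H_hat = np.array([[0.6, 0.1], [0.1, 0.4]])
edge_density = lambda A: [A[np.triu_indices(A.shape[0], k=1)].mean()]
draws, cov = parametric_bootstrap(pi_hat, H_hat, n=30, B=20, fit=edge_density, rng=rng)
```

The rows of `draws` play the role of the $B$ refitted vectors in step 3, and `cov` is the variance-covariance matrix of step 4; the empirical distribution of the rows can be used instead.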



Theorem 5. Under the conditions of Theorem 2, the parametric bootstrap
distribution of $\sqrt{n}\,(\hat\pi^{VAR} - \pi_0)$ and $\sqrt{n\lambda}\,(\hat S^{VAR} - S_0)$ converges to the
Gaussian limits given by Lemma 2.

Proof. Without loss of generality we take $B = \infty$, so that we are asking
that when the underlying parameter is $\hat\theta^{VAR}$, the random law of $\sqrt{n}\,(\hat\pi^{VAR*} -
\hat\pi^{VAR})$ and $\sqrt{n\lambda}\,(\hat S^{VAR*} - \hat S^{VAR})$ converges with $P_{\theta_0}$ probability tending to 1
to the Gaussian limits of $\sqrt{n}\,(\hat\pi^{CGM} - \pi_0)$ and $\sqrt{n\lambda}\,(\hat S^{CGM} - S_0)$ as generated
under $\theta_0$.

Let $\hat\pi^{CGM*}, \hat S^{CGM*}$ have the distribution of the CGM MLE based on the
data that we have generated from $P_{\hat\theta^{VAR}}$. By standard exponential family theory
such as our Lemma 1, we observe that

\begin{align}
\sqrt{n}\,(\hat\pi^{CGM*} - \hat\pi^{VAR}) &\to N(0, \Sigma_1), \tag{10} \\
\sqrt{n\lambda}\,(\hat S^{CGM*} - \hat S^{VAR}) &\to N(0, \Sigma_2), \tag{11}
\end{align}

since the convergence is uniform on contiguous neighborhoods of $\theta_0$ and the
mapping $\theta \mapsto (\Sigma_1(\theta), \Sigma_2(\theta))$ is smooth. As Theorem 4 implies local asymptotic
normality, a theorem of Le Cam [14, Corollary 12.3.1] implies that
$P_{\hat\theta^{VAR}} \triangleleft P_{\theta_0}$ with $P_{\theta_0}$ probability tending to 1, where $\triangleleft$ denotes contiguity.
As a result, Le Cam's first contiguity lemma in conjunction with Theorem
4 implies that

\begin{align*}
\sqrt{n}\,(\hat\pi^{CGM*} - \hat\pi^{VAR*}) &= o_{P_{\hat\theta^{VAR}}}(1), \\
\sqrt{n\lambda}\,(\hat S^{CGM*} - \hat S^{VAR*}) &= o_{P_{\hat\theta^{VAR}}}(1).
\end{align*}

Using this result with Eqs. (10) and (11), it follows that

\begin{align*}
\sqrt{n}\,(\hat\pi^{VAR*} - \hat\pi^{VAR}) &\to N(0, \Sigma_1), \\
\sqrt{n\lambda}\,(\hat S^{VAR*} - \hat S^{VAR}) &\to N(0, \Sigma_2),
\end{align*}

establishing the theorem. □

Fig 1. Quantile-quantile plots checking for normality of the estimation error, for
synthetically generated blockmodel data with n = 300, 900, and 1500 (left to right). The plots
suggest convergence to normality as n increases.



5. Simulations. Blockmodel parameter estimates $\hat\pi^{VAR}, \hat S^{VAR}$ were
estimated variationally on synthetically generated blockmodel data with $K =
3$ and parameter values S = |(J + 11 T ), $\pi$ = [| | 3], and $\lambda = \log^2 n$, for a
range of graph sizes $n$. For each value of $n$, the estimation error

\[
\hat S^{VAR} - S
\]

from 1000 simulations was recorded.

Figure 1 shows quantile-quantile plots for $n = (300, 900, 1500)$, comparing
the empirical distribution of rescaled errors $n\lambda (\hat S^{VAR} - S)^T \Sigma_2^{-1} (\hat S^{VAR} - S)$
to a chi-squared distribution with 6 degrees of freedom. We observe that the
distributions grow more similar for large $n$, as predicted by Lemma 2 and
Theorem 3. As an alternative measure of quality, we also compared $Z$ to
estimated labels induced from $q$; the average fraction of misclassified nodes
was .11, .05, and .03.

To initialize the variational algorithm, an initial clustering was computed
using spectral methods [17], giving an initial guess for $S$, $\pi$ and $q$. The spectral
method itself utilized the K-means clustering algorithm, which was initialized
randomly. We observed marked improvement by computing multiple
initial guesses and using the one for which $J$ was largest.

Appendix: Proof of Theorem 1. Let $\hat Z$ be defined to be the maximizer
of $\max_{\theta \in \Theta} f(Z, A; \theta)$ over all $Z \in \mathcal{Z}^n$. The following result from [1]
then applies to $\hat Z(A)$:

Theorem 6 ([1]). Let $\rho_n = \omega(n^{-1} \log n)$ and let $S$ have no identical
columns. Let $\Theta$ be an open compact set. There exists a sequence $b_n \to \infty$
such that

\[
\sup_{\theta \in \Theta} \mathbb{P}_\theta\bigl(Z \ne \hat Z(A)\bigr) = O(n^{-b_n}).
\]
flee 



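Proof of Theorem 1. We establish that Eqs. (3) and (4) hold. Algebraic
manipulation of $\mathbb{P}_\theta(Z \ne \hat Z(A) \mid A) \frac{g(A; \theta)}{g(A; \theta_0)}$ produces

\[
\mathbb{P}_\theta\bigl(Z \ne \hat Z(A) \mid A\bigr) \frac{g(A; \theta)}{g(A; \theta_0)}
= \frac{\mathbb{P}_\theta\bigl(A, Z \ne \hat Z(A)\bigr)}{g(A; \theta_0)}
= \sum_{Z \ne \hat Z(A)} \frac{f(Z, A; \theta)}{g(A; \theta_0)}.
\]

For fixed $\theta$ it holds that

\begin{align*}
\mathbb{E}_{\theta_0}\!\left[ \mathbb{P}_\theta\bigl(Z \ne \hat Z(A) \mid A\bigr) \frac{g(A; \theta)}{g(A; \theta_0)} \right]
&= \sum_{A} g(A; \theta_0) \sum_{Z \ne \hat Z(A)} \frac{f(Z, A; \theta)}{g(A; \theta_0)} \\
&= \sum_{A} \mathbb{P}_\theta\bigl(A, Z \ne \hat Z(A)\bigr) \\
&= \mathbb{P}_\theta\bigl(Z \ne \hat Z(A)\bigr).
\end{align*}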

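Markov's inequality and Theorem 6 imply that for any fixed $\theta$ and fixed
$\epsilon > 0$,

\[
\mathbb{P}_{\theta_0}\!\left( \mathbb{P}_\theta\bigl(Z \ne \hat Z(A) \mid A\bigr) \frac{g(A; \theta)}{g(A; \theta_0)} \ge \epsilon \gamma_n \right)
\le \frac{\mathbb{P}_\theta\bigl(Z \ne \hat Z(A)\bigr)}{\epsilon \gamma_n} = \frac{O(n^{-b_n})}{\epsilon \gamma_n}.
\]

To show uniform convergence over all $\theta \in \Theta$, let $\Theta^{(n)}$ denote a set of points
which induce an $n^{-3}$ covering of $\Theta$ in the sup norm. Let $\theta^{(i)} = (\pi^{(i)}, H^{(i)})$
denote the $i$th point in $\Theta^{(n)}$. It holds for any $\theta' = (\pi', H')$ in the ball
$B(\theta^{(i)}, n^{-3})$ and for any $Z$ that

\begin{align*}
\frac{f(Z, A; \theta')}{f(Z, A; \theta^{(i)})}
&= \left( \prod_{i=1}^{n} \frac{\pi'(Z_i)}{\pi^{(i)}(Z_i)} \right)
 \left( \prod_{i<j} \left( \frac{H'(Z_i, Z_j)}{H^{(i)}(Z_i, Z_j)} \right)^{A_{ij}} \left( \frac{1 - H'(Z_i, Z_j)}{1 - H^{(i)}(Z_i, Z_j)} \right)^{1 - A_{ij}} \right) \\
&\le \left( 1 + \frac{C_M}{n^2} \right)^{n + n(n+1)/2} \\
&= O(1),
\end{align*}

where $C_M$ is a constant which depends on $\Theta$ being strictly bounded away
from $n^{-1} \log n$ or 1 in each coordinate which is allowed to vary in $\Theta$; because
$\Phi$ is open and compact and $\rho > n^{-1} \log n$, this will always hold.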
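It follows that for $\theta' \in B(\theta^{(i)}, 1/n^3)$,

\begin{align*}
\mathbb{P}_{\theta'}\bigl(Z \ne \hat Z(A) \mid A\bigr) \frac{g(A; \theta')}{g(A; \theta_0)}
&= \sum_{Z \ne \hat Z(A)} \frac{f(Z, A; \theta')}{g(A; \theta_0)} \\
&\le O(1)\, \mathbb{P}_{\theta^{(i)}}\bigl(Z \ne \hat Z(A) \mid A\bigr) \frac{g(A; \theta^{(i)})}{g(A; \theta_0)},
\end{align*}

and hence, by a union bound over $\Theta^{(n)}$ and Markov's inequality,

\[
\mathbb{P}_{\theta_0}\!\left( \sup_{\theta \in \Theta} \mathbb{P}_\theta\bigl(Z \ne \hat Z(A) \mid A\bigr) \frac{g(A; \theta)}{g(A; \theta_0)} \ge \epsilon \gamma_n \right)
\le O(1) \cdot |\Theta^{(n)}| \cdot \frac{O(n^{-b_n})}{\epsilon \gamma_n}.
\]

Substituting $|\Theta^{(n)}| = O\bigl(n^{3(K+1)K/2 + 3(K-1)}\bigr)$ and recalling the order $\gamma_n$ of Theorem 1
completes the proof, as the right hand side converges to zero for any $\epsilon > 0$.
This establishes Eq. (3). The remaining equation, Eq. (4), is given by the
result of Theorem 6. □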


References. 

[1] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and 
Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068. 

[2] Bickel, P. J., Chen, A. and Levina, E. (2011). The method of moments and degree 
distributions for network models. Ann. Statist. 39 38-59. 

[3] Bickel, P. and Doksum, K. (1977). Mathematical Statistics: Basic ideas and selected 
topics. Holden-Day, San Francisco. 

[4] Bollobas, B., Janson, S. and Riordan, O. (2007). The phase transition in inho- 
mogeneous random graphs. Random Structures Algorithms 31 3-122. 

[5] Celisse, A., Daudin, J. J. and Pierre, L. (2011). Consistency of maximum-likelihood
and variational estimators in the Stochastic Block Model. Arxiv preprint
arXiv:1105.3288.

[6] Channarond, A., Daudin, J. J. and Robin, S. (2011). Classification and estimation
in the Stochastic Block Model based on the empirical degrees. Arxiv preprint
arXiv:1110.6517.

[7] Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with
growing number of classes. Biometrika 99 273-284.

[8] Coja-Oghlan, A. and Lanka, A. (2008). Partitioning Random Graphs with Gen- 
eral Degree Distributions. In Fifth IFIP International Conference On Theoretical 
Computer Science-TCS 2008 127-141. 
[9] Daudin, J. J., Picard, F. and Robin, S. (2008). A mixture model for random 
graphs. Statist. Comput. 18 173-183. 

[10] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based 
clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301-354. 

[11] Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space ap- 
proaches to social network analysis. J. Am. Statist. Assoc. 97 1090-1098. 

[12] Karrer, B. and Newman, M. (2011). Stochastic blockmodels and community struc- 
ture in networks. Phys. Rev. E 83 016107. 

[13] Lazer, D., Pentland, A. S., Adamic, L., Aral, S., Barabasi, A. L.,
Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M. et al.
(2009). Life in the network: the coming age of computational social science. Science
323 721.



BICKEL ET AL. 



[14] Lehmann, E. L. and Romano, J. P. (2005). Testing statistical hypotheses. Springer 
Verlag. 

[15] Proulx, S. R., Promislow, D. E. L. and Phillips, P. C. (2005). Network thinking 
in ecology and evolution. Trends in Ecology & Evolution 20 345-353. 

[16] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high- 
dimensional stochastic blockmodel. Ann. Statist. 39 1878-1915. 

[17] Von Luxburg, U. (2007). A tutorial on spectral clustering. Statist. Comput. 17 
395-416. 

E-MAIL: bickel@stat.berkeley.edu E-MAIL: dchoi@stat.berkeley.edu 

E-MAIL: xiangyuchang@gmail.com E-MAIL: zhanghai@nwu.edu.cn 






